Use of bypassing in a pipelined instruction processor



The invention relates to an instruction processing device with a pipelined
functional unit.

Data dependency imposes significant restrictions on the throughput of
instruction processors. Instruction processors execute successive operations that require input
operands and produce results. Operands are generally stored in a register file, from which
they are retrieved using operand addresses from a command in an instruction. The result is
stored in the register file using a result address of the command. When its operands are
retrieved from the register file the command cannot be executed until the operands have been
stored in the register file by preceding instructions. Thus a minimum delay between the
commands is needed. This reduces efficiency of the processor. In VLIW processors, for
example, no-operations may have to be scheduled for functional units in some instruction
cycles because insufficient operands are available.

US patent No. 5,805,852 describes how a VLIW processor can be made more
efficient by means of a bypass between a pipeline stage of a functional unit in which results
are produced and a pipeline stage wherein operands are used. The bypass makes the result
available as operand for a subsequent instruction without the delay necessary for storing the
result in the register file and retrieving it as an operand from the register file.

During pipelined operation, a functional unit first generates the result in an
execution stage of a pipeline, and stores the result in a pipeline register behind the execution
stage. Subsequently the functional unit hands on the result through the pipeline until it has
been stored in the register file. When a new command enters the pipeline of one of the
functional units, its operand addresses are compared with the addresses of results that are still
in the pipelines of respective ones of the functional units. When a match occurs the operand
is taken from the pipeline stage of the relevant functional unit rather than from the register
file.

In recent years the size of register files has tended to increase. Large register
files have the advantage that they speed up execution because it is less frequently necessary
to wait until a register is available for reuse or to spill operands to memory. The price of the
larger register files has been an increase in power consumption. The register file now often is
the major power consumer in a VLIW processor.

Amongst others, it is an object of the invention to reduce the power
consumption due to register files in an instruction processing device.

The instruction processing device according to the invention is set forth in
Claim 1. The invention is based on the observation that power can be saved by not writing
results to a register file when they do not need to be retrieved from the register file, because
they are used only via a bypass path. Prior to execution of instructions, for example during
compilation of the instructions, it can be determined whether it can be guaranteed that a
result of an operation will not be used other than via a bypass. If so, bypassing of the result
suffices and it is not necessary to store the result in the register file. By disabling storage into
the register file in this case, power consumption is reduced.

In an embodiment the instruction processing device contains a plurality of
bypass registers that are selectively addressable with a register address from the command,
for selecting a register for storing the result and/or for retrieving operand data. Thus, it is
made possible to avoid writing to the register file more often. Typically there are far fewer
bypass registers than registers in the register file, so as not to slow down the instruction cycle
duration. The result is written to a bypass register at a pipeline stage before it is written to the
register file.

In an embodiment writing of the result is disabled by suppressing a supply of
clock signals to circuitry for writing the result into the register file. When the register file has
a number of write ports, writing is disabled at selected write ports, selected under control of
the instructions.

Preferably, bypassing is controlled by a bypass control unit that compares
result register addresses from the commands with operand register addresses of later commands,
causing substitution of a result from a bypass path in case of a match of the addresses. Thus,
no special addresses are needed for bypassed results. In another embodiment the instructions
contain addresses to select between different bypass registers.

In an embodiment a chain of registers is provided for supplying bypass
operand data. Results shift through bypass registers in the chain in successive instruction
cycles. The chain extends further than necessary for writing the result into the second register
unit. This makes it possible more often to avoid power consumption for writing results to the
register file. Bypass data from the registers in the chain may be selected by comparing
operand addresses with result register addresses, or using explicit register selection
information from the instructions. The latter simplifies bypassing control circuitry.

PHNL030268EPP  10.03.2003

The invention is advantageously applied to processors such as VLIW
processors, which contain a plurality of functional units that operate in parallel. Such
processors require increasingly larger register files since more functional units operate in
parallel. By suppressing writing to the register files considerable power consumption is
saved. Preferably groups of bypass registers are provided, each for storing results from a
respective one of the functional units only, the registers of all groups being addressable from
each command for retrieving operands.

The invention also relates to a method of compiling programs, in which the
conditions for suppressing writing to the register file are detected, after which information is
added to the instructions to suppress such writing. Detection involves testing whether results
of instructions can be passed via a bypass path (this is mainly a matter of being used
sufficiently soon after production) and whether it can be guaranteed that these results will not
be used later (e.g. by scanning the instructions to detect whether the result is not used again
in any later reachable instruction before the register that contains the result is overwritten or
before the end of the program). A computer program for executing such a method may be
passed on any computer program product, such as a magnetic or optical disc, a
semiconductor memory module, a download signal etc.

These and other objects and advantageous aspects of the invention will be 
described using the following figures. 

Figure 1 shows a pipelined processor 
Figure 2 shows part of a register file 
Figure 3 shows part of a pipelined processor 
Figure 4 shows part of a further pipelined processor 
Figure 5 shows part of a further pipelined processor 

Figure 1 shows an example of a simplified pipelined VLIW processor. The
processor contains an instruction memory 10, a program counter 10a, an instruction register
11, execution units 12, a register file 14 and a bypass control unit 16. By way of example,
two execution units 12 are shown in parallel, but in practice more execution units may be
used. Each execution unit may contain a group of functional units (not shown), or be a
functional unit by itself. Instruction register 11 has outputs for a plurality of commands from
an instruction, each command for a respective one of execution units 12. Each command
contains a part for an opcode, a part for operand register addresses and a part for a result
register address. The outputs of instruction register 11 for the operand register address parts
of the commands are coupled to read ports of register file 14 and to operand register address
inputs of bypass control unit 16. Usually, each command contains two operand addresses, but
for the sake of clarity connections for only one operand (address) are shown. More than two
operands are also possible for operations such as the multiply-accumulate.

The processor is divided into successive pipeline stages, which are separated
by means of pipeline registers. For each execution unit 12 the processor contains first stage
pipeline registers 120, 122, 124 and a multiplexer 123 between instruction register 11 and the
execution unit 12. A first one of the first stage pipeline registers 124 stores the opcode part of
the command for the execution unit 12. A second one of the first stage pipeline registers 122
stores operands of the command for the execution unit 12. A third one of the first stage
pipeline registers 120 stores the result address of the command for the execution unit 12 and
write control information. The first one of the first stage pipeline registers 124 has an input
coupled to the outputs of instruction register 11 for the opcode part of the command for the
execution unit 12. The third one of the first stage pipeline registers 120 has an input coupled
to the outputs of instruction register 11 for the result address part of the command for the
execution unit 12.

The second one of the first stage pipeline registers 122 has an input coupled,
via multiplexer 123, to the read port of register file 14 to which the operand address parts of
the command for the execution unit 12 are supplied. In principle, there will be a respective
multiplexer 123 and a respective second one of the first stage pipeline registers 122 with
similar connections for each respective one of the operands of the command for the
execution unit, but for the sake of clarity only one multiplexer 123 and second one of the first
stage pipeline registers 122 is shown.

Second stage pipeline registers 126, 128 are included behind execution units
12. A first one of the second stage pipeline registers 126 is coupled to the third one of the
first stage pipeline registers 124, for receiving the result register address parts of commands
and the write control information. A second one of the second stage pipeline registers 128 is
coupled to a result output of execution unit 12. The first and second ones of the second stage
pipeline registers 126, 128 are coupled to write ports of register file 14, for supplying results
and corresponding result register addresses, as well as the write control information.

Multiplexers 123 each have an input coupled for receiving an addressed
operand from the read ports of register file 14, and for receiving bypass operands from the
second ones of the second stage bypass registers 128 via bypass paths 15. The third one of
the first stage registers 124 and the first one of the second stage registers 126 pass operand
register addresses and result addresses to bypass control unit 16 respectively. Bypass control
unit 16 controls multiplexers 123 to determine which of their inputs is coupled to the second
one of the first stage pipeline registers 122.

In operation program counter 10a supplies a series of instruction addresses to
instruction memory 10. In response to each instruction address instruction memory supplies a
respective instruction to instruction register 11. Each instruction contains commands for a
plurality of execution units 12. The commands may contain operand register addresses of the
operand or operands of the command. The operand register addresses are supplied to read
ports of register file 14. In response register file 14 supplies the addressed operands from the
read ports. Normally, the operands are supplied to the relevant execution units 12 with the
(optionally decoded) opcode part of the corresponding command. The execution units 12
execute the commands using the operand and produce a result. For an "ADD" command for
example, two operands are used and their sum is produced as a result. The result is supplied
to a write port of register file 14, together with a result register address from the command.

Command execution is pipelined. This means that a new instruction starts in
substantially every instruction cycle and that successive steps of execution of an instruction
are executed in successive instruction cycles. For example in a first instruction cycle the
instruction memory is addressed, in a second instruction cycle the operands are fetched, in a
third instruction cycle the command proper is executed by the execution unit 12 and in a
subsequent instruction cycle the result is written to register file 14. Thus, execution of an
instruction takes a number of instruction cycles. The respective parts of the processor that
execute the instruction in respective successive instruction cycles are called pipeline stages.
In a particular instruction cycle the different pipeline stages process different instructions in
different stages of execution. The different pipeline stages are separated by pipeline registers
to separate the information of the different instructions under process. Instruction cycles are
indicated by a clock (not shown) which controls takeover of information from a preceding
pipeline stage into the pipeline register at the end of each instruction cycle.
Pipelined processors are known per se. It will be appreciated that the figure
shows only one simple embodiment. Without deviating from the invention many alternative
pipelined architectures may be used, in which different tasks are distributed differently over
different pipeline stages and in which additional pipeline stages may be added.

Bypass paths 15 serve to reduce the need to wait with execution of an
instruction until its operands have been stored in register file 14. If a first instruction
produces a result that is used as operand by a second instruction, the result can be passed to
execution unit 12 via bypass paths 15 before the result has been written to register file 14.
Bypass control unit 16 compares the operand register address of the second instruction with
the result register address of the first instruction. When a match occurs bypass control unit 16
controls multiplexer 123 to pass the result from pipeline register 122 instead of a result from
register file 14. This makes it possible to execute the second instruction sooner.
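
By way of illustration only, the address comparison performed by a bypass control unit may be sketched in software as follows. This is a behavioural sketch, not the described circuit: the names `InFlightResult` and `select_operand` are hypothetical, and it is assumed that pending results are listed youngest first so that the most recent result for an address wins.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class InFlightResult:
    """A result held in a pipeline register, not yet written to the register file."""
    result_addr: int
    value: int


def select_operand(operand_addr: int,
                   register_file: Sequence[int],
                   in_flight: List[InFlightResult]) -> Tuple[int, bool]:
    """Return (operand value, True if it was taken from a bypass path).

    Models the multiplexer choice: on an address match the pending result
    is substituted for the data read from the register file.
    """
    for pending in in_flight:  # assumed ordered youngest first
        if pending.result_addr == operand_addr:
            return pending.value, True
    return register_file[operand_addr], False


register_file = [0] * 8
register_file[3] = 111  # stale value still in the register file
in_flight = [InFlightResult(result_addr=3, value=42)]

print(select_operand(3, register_file, in_flight))  # (42, True)
print(select_operand(2, register_file, in_flight))  # (0, False)
```

In the sketch the register file is always read; only the selection differs, which mirrors a multiplexer that chooses between the read port and a bypass path.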

Bypassing itself is known per se. It should be appreciated that many forms of
bypassing exist and that the form shown in figure 1 is but one example, to which the
invention is not limited. For example, the results used for the bypass may come from
different pipeline stages than the one shown (e.g. directly from execution unit 12 to first stage
pipeline register 122, or from a later stage (not shown)), or from more than one pipeline
stage. Correspondingly, result register addresses may come from different stages than the one
shown. Such addresses or comparison results may be stored in bypass control unit 16 to
pipeline comparison as well. In addition a code may be present in the instructions to indicate
whether bypassing should be used or not. This makes it possible to avoid bypassing when an
instruction should use an "old" value from register file 14.

According to the invention write back of the result to register file 14 can be
disabled under control of the instructions. This will be done when the result is used only via
the bypass path 15 and not from register file 14. During compilation of a program of
instructions a guaranteed last instruction of the program that uses a result is determined for
each execution path that can be followed (if applicable via branch instructions). It will be
appreciated that some execution paths may in fact never be reached during execution, but if
this cannot be proven at compile time for some path, the last instruction that uses the result
if that path is followed should be determined. If each last instruction that has been determined
is scheduled to be executed so shortly after the instruction that produces the result that it can
receive the result via bypassing, the compiler adds disable information to the instruction that
produces the result. This indicates that the result of the instruction need not be stored in
register file 14. The disable information is passed down the pipeline with the result register
address to register file 14 to disable storage of the result in register file 14.
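
The compile-time test described above can be sketched for the simplest case. The sketch below assumes straight-line code without branches, one command per instruction, and a bypass that is reachable only within `window` instructions after the producer; the names `Command`, `mark_bypass_only` and `disable_write` are hypothetical and not taken from the description. A real compiler would have to repeat the test for every reachable execution path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    dest: Optional[int]          # result register address, None if no result
    srcs: List[int]              # operand register addresses
    disable_write: bool = False  # set when the register file write can be skipped


def mark_bypass_only(program: List[Command], window: int) -> None:
    """Set disable_write on producers whose result is only ever read via bypass."""
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        bypass_only = True
        for j in range(i + 1, len(program)):
            later = program[j]
            # A read outside the bypass window needs the register file.
            if producer.dest in later.srcs and j - i > window:
                bypass_only = False
                break
            # Once the register is overwritten, the old result is dead.
            if later.dest == producer.dest:
                break
        producer.disable_write = bypass_only


program = [
    Command(dest=1, srcs=[2, 3]),   # r1 produced
    Command(dest=4, srcs=[1, 2]),   # r1 read one instruction later
    Command(dest=1, srcs=[5, 6]),   # r1 overwritten
    Command(dest=7, srcs=[4]),      # r4 read two instructions after it was produced
    Command(dest=None, srcs=[1]),   # second r1 read two instructions later
]
mark_bypass_only(program, window=1)
print([c.disable_write for c in program])  # [True, False, False, True, False]
```

Note that the sketch also disables the write for a result that is never read at all (the fourth command), since such a result is trivially not needed from the register file.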

Figure 2 shows an example of a multiport register file with multiport registers
20 that have plural read ports 28. Data supply circuits 22 and addressing circuits 24 are
coupled to multiport registers 20 for supplying result data and register selection signals
respectively. Data supply circuits 22 have inputs RES1, RES2 for results. Addressing circuits
24 have inputs ADDR1, ADDR2 for result register addresses. Clock enable circuits 26 are
coupled between a clock input CLK and clock terminals of data supply circuits 22 and
addressing circuits 24. Clock enable circuits 26 are used to disable supply of clock signals to
data supply circuits 22 and addressing circuits 24 under control of disable inputs DIS1, DIS2.

In operation clock pulses from clock enable circuits 26 cause data supply
circuits 22 and addressing circuits 24 to drive supply of new data and selection signals to
multiport registers 20. Information from the instructions is supplied to disable inputs DIS1,
DIS2 for selectively disabling supply of clock signals for using data and addresses from those
ports RES1, ADDR1, RES2, ADDR2 from which results are supplied for which it has been
determined that no further use will be made of the results in the program. As a result power
consumption by the data supply circuits 22 and addressing circuits 24 is reduced.
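
The effect of the disable inputs can be illustrated with a small behavioural model. This is a sketch under stated assumptions: the class `RegisterFileModel` and its counter `clocked_writes` are hypothetical, and the counter merely stands in for the switching activity of the write circuitry; it is not a power measurement.

```python
from typing import Iterable, List, Tuple


class RegisterFileModel:
    """Behavioural model of a register file with clock-gated write ports."""

    def __init__(self, num_registers: int) -> None:
        self.registers: List[int] = [0] * num_registers
        self.clocked_writes = 0  # number of write ports that actually received a clock

    def clock(self, ports: Iterable[Tuple[int, int, bool]]) -> None:
        """One instruction cycle; each port is (RES, ADDR, DIS)."""
        for res, addr, dis in ports:
            if dis:
                continue  # clock suppressed: no update, no write activity
            self.registers[addr] = res
            self.clocked_writes += 1


rf = RegisterFileModel(4)
rf.clock([(10, 1, False),   # port 1: normal write
          (20, 2, True)])   # port 2: write disabled by the instruction
print(rf.registers, rf.clocked_writes)  # [0, 10, 0, 0] 1
```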

It will be appreciated that multiport register files are known per se. Many
alternative architectures may be used without deviating from the invention. According to the
invention the register file is arranged to disable selected parts of the register file. Instead of
disabling both data supply circuits 22 and addressing circuits 24, only one of these circuits may
be disabled, and/or any other circuit that consumes power when driven to update register
content.

Figure 1 shows that the disable information for a result is passed along with
the result register address of a command, e.g. in the form of an additional bit from the
command. This requires a minimal amount of modification of the processor to disable writing
of the result. However, it will be appreciated that many alternative solutions exist for
indicating selective disabling. For example, the information may be contained at the level of
the instruction rather than at the level of individual commands, by encoding for example the
position of one or more commands in the instruction for which disabling is allowed. In fact,
the disable information need not even be contained in the same instruction as the relevant
command. Instead it may be contained in an earlier or later instruction that is known to be
executed at a defined pipeline delay with respect to the command. Similarly, the information
to disable may be contained in the opcode of a command, in which case it may be decoded
anywhere in the pipeline for ultimate supply to the disable input of a write port of register file
14. In yet another embodiment the information to control disabling may be supplied from an
operand register. In this embodiment, data from an addressed operand register is passed along
the pipeline stages to register file 14 to control disabling.
Although only a single result register 126 has been shown for each execution
unit, it will be appreciated that the invention is not limited to such a single register.

Figure 3 shows part of a processor with a plurality of result registers 32. Two
operand inputs are shown for execution unit 12. The result output of execution unit 12 is
coupled to a result register 128 in parallel to a number of further result registers 32. Result
register 128 is coupled to circuitry further down the pipe-line (not shown). A register select
unit 30 has an input coupled to a first stage result register address register 124 and select
outputs coupled to select inputs of further result registers 32. The outputs of further result
registers 128 are coupled to inputs of an operand selection unit 34, which in turn has outputs
coupled to multiplexers 123.

In operation the processor stores selected results in selected ones of the further
registers 32. First stage result register address register 120 receives the result register address
and the disable information for disabling writing to register file 14. Register select unit 30 is
enabled when the disable information indicates that writing to register file 14 will be
disabled. In this case, register select unit 30 uses part of the result register address to select
one of the further result registers 32. The result from execution unit 12 is written into the
selected further result register 32. Subsequently, when an operand address is received that
indicates that a result from a further result register 32 should be used as operand, bypass
control unit 16 signals operand selection unit 34 to select the output of one of further result
registers 32 on the basis of the operand address. The result is then passed to execution unit 12
as operand from operand selection unit 34 via multiplexer 123.

It should be appreciated that the number of further result registers 32 is much
smaller than the number of registers in register file 14. This makes it possible to include these
further result registers 32 at the end of the pipe-line stage that contains execution unit 12
without a large time penalty. Thus, a limited number of results can be stored at an earlier
pipeline stage than the pipeline stage in which results are normally stored in register file 14.
These results can be made available as operands from further result registers 32 well before
they could have been made available from register file 14.

It should also be appreciated that figure 3 merely illustrates an example of how
further result registers 32 may be used. Many alternatives are possible. For example, further
result registers 32 might be included in a subsequent pipeline stage, i.e. following result
register 128, or in more than one pipeline stage. The former may be the case for example
when inclusion of the further result registers at the execution stage would require an
extension of the duration of the instruction cycle. When different execution units 12 are used
in parallel, different sets of further result registers 32 may even be included at different
pipeline stages. As another example, selection of the result registers 32 might be
implemented differently. For example, a dedicated address part might be included in
instructions to select a further result register, or on the contrary register address values that do
not address physical registers in register file 14 might be used to select further registers 32.
As another example a mode register might be used to select between using and not using
further result registers 32. Also selection information may be part of the opcode of
commands, instead of coming from the result register address field.

Similarly, various solutions are possible for determining when a result from a
further result register 32 should be used instead of data from register file 14. In one
embodiment operand addresses contain a special indication to indicate that a result from a
further result register 32 should be used.

Figure 4 shows an alternative embodiment wherein address matching is used.
For each further result register 32 a respective register address register 40 is provided that
stores the result register address for the result stored in further result register 32. Bypass
control unit compares operand register addresses with the result register addresses from
register address registers. When a match occurs bypass control unit 16 signals operand
selection unit 34 and multiplexers 123 (shown in Figure 3) to substitute a result from further
result registers 32 for operand data from register file 14. Preferably, bypass control unit 16
also compares result register addresses from the pipeline register 124 with stored result
register addresses from register address register 40 and resets a register address register 40
when it contains the register address specified in a command. Thus it is ensured that a
result from register file 14 will be used subsequently for that register address (unless of
course a new result is stored in further result registers 32 subsequently). It will be appreciated
that further result registers 32 with register address registers 40 form an embodiment of a
simple associative memory. Other types of associative memories may be used as an
alternative.
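
The associative behaviour of the further result registers and their register address registers may be illustrated by the following sketch. It is a software model under assumptions: the class `FurtherResultRegisters`, its method names and the explicit `slot` argument are hypothetical, and the reset of an address register is modelled by setting its tag to `None`.

```python
from typing import List, Optional, Sequence


class FurtherResultRegisters:
    """Small associative store: each value is tagged with its result register address."""

    def __init__(self, size: int) -> None:
        self.tags: List[Optional[int]] = [None] * size  # register address registers
        self.values: List[int] = [0] * size             # further result registers

    def invalidate(self, result_addr: int) -> None:
        """Reset every address register holding result_addr."""
        for slot, tag in enumerate(self.tags):
            if tag == result_addr:
                self.tags[slot] = None

    def store(self, slot: int, result_addr: int, value: int) -> None:
        self.invalidate(result_addr)  # keep at most one entry per address
        self.tags[slot] = result_addr
        self.values[slot] = value

    def lookup(self, operand_addr: int) -> Optional[int]:
        for tag, value in zip(self.tags, self.values):
            if tag == operand_addr:
                return value
        return None


def read_operand(operand_addr: int,
                 register_file: Sequence[int],
                 further: FurtherResultRegisters) -> int:
    """Substitute a matching further result for register file data."""
    hit = further.lookup(operand_addr)
    return register_file[operand_addr] if hit is None else hit


register_file = [0] * 8
register_file[5] = 7
further = FurtherResultRegisters(size=2)
further.store(slot=0, result_addr=5, value=99)
print(read_operand(5, register_file, further))  # 99
further.invalidate(5)  # a later command names register 5 as its result address
print(read_operand(5, register_file, further))  # 7
```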

Although it has been assumed in the preceding that the instructions contain
information to indicate which of the further result registers 32 should be used to store a
particular result, it should be appreciated that instead it may suffice to indicate merely that
the particular result and its result register address should be stored in one of the further result
registers 32 and its corresponding further result register address register 40, and optionally
for how many instruction cycles the result should be stored, or how many times it will be
read. In this case, an automatic register assignment unit may be provided to assign the result
to any free one of the further result registers 32 (a further register may be recorded as free
after a predetermined number of instruction cycles (or the optionally programmed number)
has passed, or after the result has been read (or read the programmed number of times)).
Although useful, further result register address register 40 is not strictly necessary for this: at
compile time it is possible to predict which further result register 32 will be used and a
selection address for that further result register 32 may be included in the operand
specification.

Although only one execution unit 12 was shown in figures 3 and 4 for the sake
of clarity, it will be understood that in practice a plurality of execution units 12 may be used
in parallel to execute different commands from an instruction. In this case, each execution
unit 12 may be provided with its own set of further result registers 32. Bypass control unit 16
controls whether and from which set results are substituted for operand data from register file
14. As an alternative, a shared set of further result registers 32 may be used. In this case the
instructions should contain information to indicate which, if any, of the execution units 12
should write to which further result register 32 (and, if necessary, to its corresponding result
register address register 40).

It will be appreciated that writing into further result registers can make it
superfluous to write into register file 14. By including information in the instructions to
selectively write results into further result registers 32 and to disable writing of a result into
register file 14 overall power consumption can be reduced. Of course, writing into further
result registers also consumes power, but because the number of these further result registers
32 is smaller than that in register file 14 less power is consumed. The size of register file 14
can therefore be extended without a severe power consumption penalty. As shown in figure
3, the same signal is used to signal disabling of writing into register file 14 and to enable
writing into further result registers 32. This helps to reduce instruction size. As a result,
results are written either in further result registers 32 or in register file 14. However, it is also
possible to use independently settable control information for disabling writing into register
file 14 and for selecting further result registers 32. Thus, under program control a bypass of a
result from further result registers can be selectively combined, or not, with longer term
storing in register file 14, as needed for a particular program.



Figure 5 shows part of another embodiment of a processor. Compared to the
embodiment of figure 1, in the embodiment of figure 5 a number of further result register
address registers 50 and further result registers 52 have been added downstream from result
register address registers 126 and result registers 128. Further result register address registers
50 and further result registers 52 have outputs coupled to bypass control unit 16. In operation
results of execution of commands and result register addresses of these results are passed
down the pipeline, even after the instruction cycle from which they could be retrieved
from register file 14 if writing is enabled. Bypass control unit 16 controls whether results
from further result register address registers 50 and further result registers 52 are substituted
for operand data from register file 14. This may be done on the basis of register operand/result address
comparison or using explicit information from the instruction. Use of further result register
address registers 50 and further result registers 52 makes it possible to disable writing of a
result into register file 14 when that result is needed only within a predetermined number of
instruction cycles after it has been generated. When this is the case, information is added to
the instructions to disable writing of the result into register file 14. Thus, power consumption
for writing into register file 14 can be saved.
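
The chain of further registers can be modelled as a shift register of fixed length. The following is an illustrative sketch only: the class `BypassChain` is hypothetical, each stage is assumed to hold one (result register address, result) pair or nothing, and selection is assumed to be by address comparison with the youngest matching stage taking priority.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Entry = Optional[Tuple[int, int]]  # (result register address, result) or empty


class BypassChain:
    """Results shift one stage per instruction cycle and drop off the end."""

    def __init__(self, length: int) -> None:
        self.stages: Deque[Entry] = deque([None] * length, maxlen=length)

    def shift(self, entry: Entry) -> None:
        """Advance one instruction cycle; the oldest stage is discarded."""
        self.stages.appendleft(entry)

    def lookup(self, operand_addr: int) -> Optional[int]:
        for entry in self.stages:  # youngest stage first
            if entry is not None and entry[0] == operand_addr:
                return entry[1]
        return None


chain = BypassChain(length=3)
chain.shift((5, 99))    # result for register 5 enters the chain
chain.shift(None)
chain.shift(None)
print(chain.lookup(5))  # 99: still within three cycles of production
chain.shift(None)
print(chain.lookup(5))  # None: the result has left the chain
```

With a chain of this length, a result whose last use falls within three instruction cycles of its production never needs to reach the register file in this model.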

In an embodiment storing of a register address and a result in further result
register address registers 50 and further result registers 52 is disabled if the result is stored in
register file 14. Thus, power is saved when storing in register file 14 is not disabled. In this
embodiment additional registers for disable information are included in parallel with further
result register address registers 50. The disable information is coupled to further result
register address registers 50 and further result registers 52. The disable information is used to
disable updating of the content of further result register address registers 50 and further result
registers 52 when the disable information indicates that writing of the result into register file
14 has not been disabled. In this case, the disable information may also be used to indicate to
bypass control unit that the operand data should be retrieved from register file 14 instead of
further result register address registers 50 and further result registers 52.

It will be understood that any number of one or more further result register
address registers 50 and further result registers 52, not only two, may be included in series,
permitting results to be bypassed as operands. It will also be understood that additional result
register address registers 50 and result registers 52 may be present in the pipeline between the
result register address register 126 and the result registers 128 that follow execution unit 12
on one hand and further result register address registers 50 and further result registers 52 on
the other hand when there are additional pipeline stages behind execution unit. Bypass
control unit 16 uses results from these stages for bypass as well.
CLAIMS:



1. An instruction processing device comprising

an instruction issue unit for issuing successive instructions;

a plurality of pipe-line stages coupled to the instruction issue unit, at least one
of the pipe-line stages comprising a functional unit for executing a command from the
instructions;

a first register unit coupled to the functional unit for storing a result of
execution of the command when the command has reached a first one of the pipeline stages,
and for supplying bypass operand data to a circuit in a pipe-line stage preceding the first one
of the pipeline stages;

a second register unit, coupled to the functional unit for storing the result when
the command has reached a second one of the pipeline stages, downstream from the first one
of the pipeline stages, and for supplying operand data to the functional unit;

a disable circuit coupled to selectively disable storing of the results in the
second register unit under control of the instructions.

2. An instruction processing device according to Claim 1, wherein the first and 
second register unit each comprise a plurality of registers and addressing circuitry for 
selective addressing with a register address from the command, for selecting a register for 
storing the result and/or for retrieving operand data. 

3. An instruction processing device according to Claim 2, wherein the first 
register unit contains fewer registers than the second register unit. 

4. An instruction processing device according to Claim 2, wherein the disable
circuit is arranged to suppress a supply of clock signals to circuitry for writing the result into
a register of the second register unit from a write port of the second register unit.

5. An instruction processing device according to Claim 3, comprising a plurality
of functional units, arranged to execute respective commands from an instruction in parallel,
the second register unit having a plurality of write ports for writing the result from respective
ones of the functional units, the disable circuit being arranged to disable writing at selected
write ports, selected under control of the instructions.

6. An instruction processing device according to Claim 2, comprising a bypass
control unit arranged to compare a result register address for the result from a first one of the
commands with an operand register address from a second one of the commands that follows
the first one of the commands directly or indirectly, and to substitute a result from the register
of the first register unit that contains the result for an operand from the second register unit in
case of a match of the addresses.

7. An instruction processing device according to Claim 1, wherein the first
register unit comprises a chain of registers for supplying bypass operand data, arranged as a
shift register with an input coupled to a result output of the first one of the stages and
operative to shift the result through successive shift register stages in successive instruction
cycles, at least if storing of the result in the second register unit is disabled, the chain
extending further than necessary for writing the result into the second register unit.

8. An instruction processing device according to Claim 7, wherein the registers
in the chain are addressable from the commands.

9. An instruction processing device according to Claim 2, comprising a plurality
of functional units, arranged to execute respective commands from an instruction in parallel,
the first register unit comprising respective groups of registers, each for storing results from a
respective one of the functional units only, the registers of all groups being addressable from
the command for retrieving an operand.

10. A method of executing a program of instructions in an instruction processor,
the method comprising

pipelining execution of commands from the instructions;

in the absence of instruction to the contrary storing results of the commands in
a register file;

in the absence of instruction to the contrary retrieving register sourced
operands of the commands from the register file;

selectively using a first one of the results bypassed from a pipelining stage as a
bypassed operand instead of at least one of the register sourced operands from the register
file;

selectively suppressing, under program control, writing of the first one of the
results to the register file.

11. A method according to Claim 10, comprising writing the first one of the
results into an addressable one of a plurality of bypass registers that are located to receive the
result earlier during pipelining than the register file.

12. A computer program product comprising instructions for an instruction
processor for implementing the method according to Claim 10 or 11.

13. A method of compiling a program of instructions for an instruction processor,
the method comprising

generating a series of instructions;

first detecting for a result to be produced by a first one of the instructions
which second ones of the instructions use the result as operand;

second detecting whether it can be guaranteed that it will be possible to bypass
the result in the instruction processor as operand for all second ones of the instructions
without retrieving the result from a register file;

generating information in the instruction to disable writing to the register file
when it can be guaranteed that it will be possible to bypass the result as operand in the
instruction processor for all second ones of the instructions.

14. A method of compiling according to Claim 13, comprising including an
indication in the instructions that the result should be stored in one of a plurality of bypass 
registers that is addressable on writing and/or reading of the result to the plurality of bypass 
registers. 

15. A computer program product comprising instructions for an instruction
processor for implementing the method according to Claim 13 or 14.
ABSTRACT:



An instruction processing device has a plurality of pipe-line stages with a functional unit
for executing a command from an instruction. A first register unit is coupled to the functional
unit for storing a result of execution of the command when the command has reached a first
one of the pipeline stages, and for supplying bypass operand data to the functional unit. A
register file is coupled to the functional unit for storing the result when the command has
reached a second one of the pipeline stages, downstream from the first one of the pipeline
stages, and for supplying operand data to the functional unit. A disable circuit is coupled to
selectively disable storing of the results in the register file under control of the instructions.



Fig. 1

[Drawing sheets 1/4 to 4/4, reference PHNL030268, including FIG. 3 and FIG. 4; images not reproduced.]
